Variable-length acoustic units inference for text-to-speech synthesis

Author

  • Olivier Boëffard
Abstract

The best voices in text-to-speech (TTS) synthesis are currently obtained with systems based on the concatenation of acoustic units. In such systems, the choice of the units whose concatenation will produce the acoustic message is a crucial stage. Moreover, current TTS systems most often use acoustic units that correspond to variable-length phonetic descriptions. This article proposes an original framework for automatically determining an optimum set of variable-length acoustic units.

Similar resources

Inference of variable-length linguistic and acoustic units by multigrams

The efficiency of pattern recognition algorithms is highly conditioned to a proper definition of the patterns assumed to structure the data. The multigram model provides a statistical tool to retrieve sequential variable-length regularities within streams of data. In this paper, we present a general formulation of the model, applicable to single or multiple parallel strings of data having eithe...

Inference of variable-length acoustic units for continuous speech recognition

In the field of speech recognition, the patterns assumed to structure the speech material (phonemes, triphones, words...) are defined a priori according to a linguistic criterion, whereas the recognition criterion is based on an acoustic similarity measure. From this may result a lack of consistency for the recognition units. In this paper, we explore the possibility of a more data-driven approach...

Audio-Visual Unit Selection for the Synthesis of Photo-Realistic Talking-Heads

This paper investigates audio-visual unit selection for the synthesis of photo-realistic, speech-synchronized talking-head animations. These animations are synthesized from recorded video samples of a subject speaking in front of a camera, resulting in a photo-realistic appearance. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mo...

Cantonese text-to-speech synthesis using sub-syllable units

This paper describes our recent investigation on the use of both intra-syllable and cross-syllable acoustic units for Cantonese text-to-speech synthesis. In our previous work, isolated monosyllable units were used for concatenative speech synthesis of Cantonese. The synthetic speech was considered to be unnatural in such a way that there was an obvious lack of perceptual continuity. The propose...

Natural-sounding Speech Synthesis Using Variable-length Units

The goal of this work was to develop a speech synthesis system which concatenates variable-length units to create natural-sounding speech. Our initial work in this area showed that by careful design of system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. In order to extend the flexibility of this fram...

Publication date: 2001